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Summary 


1.  Research  Objectives 

Although  there  has  recently  been  progress  in  research  on  nonlinear  problems,  there  is  no 
commercially  available  software  package  for  dealing  with  them.  The  objective  of  Phase  I  in  this 
project  was  to  write  and  test  a  preliminary  code  to  show  the  feasibility  of  a  general  purpose  nonlinear 
solver.  In  doing  so,  we  concentrated  on:  ' 


1)  Considering  appropriate  simple  techniques  for  handling  nonlinearities^ 

2)  Choosing  the  most  appropriate  linear  system  packages  to  incorporate  in  the  package^ 

3}  Improving  and  incorporating  more  sophisticated  algorithms  for  handling  nonlinearities. 

We  emphasized  the  first  and  third  objectives.  For  the  second  objective  we  have  also  done  some  tests 
with  PCGPAIv,  the  Preconditioned  Conjugate  Gradient  Package  of  subroutines  for  the  iterative 
solution  of  large,  sparse,  nonsymmetric  systems  of  linear  equations,  a  package  implemented  by 
Scientific  Computing  Associates. ' 

We  have  developed  a  preliminary  version  of  a  general  purpose  co^e  based  on  a  nonlinear  version 
of  GMRES,  a  method  for  solving  nonsymmetric  linear  systems'1!*^  While  studying  the  feasibility 
of  a  general  purpose  nonlinear  solver,  we  decided  that  such  a  package  should  be  more  general 
than  that  which  we  originally  proposed.  That  is,  such  a  package  should  not  only  solve  nonlinear 
equations,  but  also  should  be  able  to  accelerate  already  existing  packages.  The  reason  is  that  there 
are  a  host  of  specialized  codes  that  have  taken  years  to  develop  by  engineers  and  scientists;  it  seems 
inefficient  to  replace  these  codes  entirely.  Our  general  purpose  nonlinear  solver  can  take  the  outer 
loops  of  these  codes  and  accelerate  them.>We  explain  this  promising  approach  in  Section  2.2. 

Using  the  Jacobian-free  approach,  we  were  able  to  solve  relatively  hard  problems  in  a  short 
period  of  time  and  with  a  minimum  of  programming  effort.  As  we  stressed  in  our  initial  proposal, 
when  combined  with  a  method  like  GMRES' this  approach  has  the  enormous  advantage  of  being 
easy  to  implement  because  it  requires  no  manipulation  of  the  Jacobian  matrix  in  any  form.  Yet  it 
is  just  as  powerful  as  Newton’s  method  since  it  is  in  effect  an  inexact  Newton  method.  This  makes 
it  possible  to  develop  a  general  purpose  accelerator.  For  example,  consider  an  iteration  of  the  form 
irn+i  =  G(u,  1).  The  purpose  of  this  iterative  scheme  is  to  attempt  to  solve  the  equation 
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Scientific  Computing  Associates 

Solving  the  above  equation  by  a  Newton-type  iteration  would  be  in  most  cases  impossible  for  even 
a  slightly  complicated  function  G.  On  the  other  hand  NLGMR,  the  nonlinear  GMRES,  requires 
no  explicit  Jacobian;  it  requires  only  the  subroutine  that  computes  G(u)  for  a  given  u. 

2.  Status  of  the  Research  Effort 

2.1.  NLPACK:  a  general  purpose  nonlinear  system  solver 

NLPACK,  our  projected  package,  has  two  main  branches.  The  first  is  the  Jacobian-free  part 
of  the  package,  which  contains  the  general  purpose  accelerator.  The  second  is  the  Newton-GMRES 
package  using  explicit  or  internally  computed  Jacobians.  In  Phase  I  we  have  concentrated  on  the 
first  branch  because  it  constitutes  the  more  original  and  important  work.  Both  branches  however, 
will  utilize  the  same  NLGMR  subroutine  as  a  nonlinear  solver,  which  is  described  next. 

2.1.1.  NLGMR:  A  nonlinear  version  of  GMRES 

NLGMR  is  a  nonlinear  extension  of  GMRES,  a  method  devised  for  solving  linear  equations. 
It  can  be  viewed  as  Newton’s  method  using  the  linear  GMRES  for  solving  the  Jacobian  system. 
This  constitutes  the  kernel  of  NLPACK. 

In  what  follows  we  are  interested  in  solving  the  nonlinear  system 

F(u)  —  0,  (2.1) 

where  F  is  a  nonlinear  application  from  R^  to  R^.  Throughout  we  denote  by  ||.||  the  Euclidean 
norm  in  RN ,  and  Jf(u)  the  Jacobian  matrix  of  the  application  F  at  the  point  u.  When  there  is  no 
ambiguity,  Jp(u)  will  be  denoted  by  Jp. 

Suppose  that  u„  is  the  current  approximation  and  that  we  wish  to  find  a  new  approximation 
of  the  form  un+i  =  u„  -f  6.  In  the  nonlinear  version  of  GMRES  [18),  the  vector  6  is  written  in  the 
form 

m 

<5  =  a, v,-,  (2.2) 

i=i 

where  the  v,  ’s  are  m  orthonormal  vectors,  the  determination  of  which  will  be  clarified  shortly,  and 
the  or’s  are  unknowns  to  be  determined.  Ideally  we  would  want  to  minimize 

m«n  +  «)|| 

over  all  vectors  6  to  get  the  new  iterate  un+j  =  u„  +  6.  Unfortunately,  this  would  lead  to  a 
numerically  difficult  minimization  problem.  Instead,  we  can  linearize  F  about  the  point  u„  by 
writing  F(un  +  6)  &  F(un)  +  Jp6  where  Jp  represents  the  Jacobian  of  F  at  u„,  and  we  minimize 

\\F(un)  +  Jp6\\  (2.3) 
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(1)  Start:  Choose  Uq  and  a  dimension  m  of  the  Krylov  subspace.  Set  n  =  0. 

(2)  Arnoldi  process: 

•  Compute  =  JJf  («,, ) JJ  and  iq  =  F{ u„)/fi. 

•  For  j  =  1,2, ..,  m  do: 


hij  =  {JFVj,Vi),  i  =  1,2 

i 

Vj+1  =  ^Fvj  ~~  'y  '.hjjVj, 

i'=i 

fti+U  =  ll*V+ill>  and 
v>+i  =  *y+i/^>+u- 

Define  to  be  the  (m  +  1)  x  m  (Hessenberg)  matrix  whose  nonzero  entries  are  the 
coefficients  fii>.  1  <  *  <  j  +  1,  1  <  j  <  m  and  define  Vm  =  [iq,  v?, . . .  um] 

(3)  Form  the  approximate  solution: 

•  Find  the  vector  ym  that  minimizes  the  function  q(y)  =  ||/?ei  -  Hmy ||,  where  e\  — 
[1,0, . .  .0]7’,  among  all  vectors  y  of  Rm. 

•  Compute  6n  =  Vmym  and  un+1  =  «„  +  6n. 

(4)  Restart:  If  satisfied  stop,  else  set  u„  *—  u„+i  ,  n  <—  n  +  1,  and  goto  (2). 

Algorithm  1:  NonLinear  Generalized  Minimal  Residual  (NL- 
GMR) 

over  all  vectors  6  of  the  form  (2.2).  The  above  minimization  can  be  achieved  by  performing  m  steps 
of  the  GMRES  algorithm  [15],  applied  to  the  linear  equation 


=  —F(un),  (2.4) 

starting  GMRES  with  the  initial  solution  =  0.  Notice  that  solving  this  equation  exactly  will 
yield  the  Newton  direction  JflF(un).  Initially,  we  assume  that  the  Jacobian  Jf  is  available  and 
will  temporarily  not  concern  ourselves  with  the  way  in  which  it  is  computed. 

Algorithm  1  is  a  restarted  version  of  the  GMRES  algorithm  which  at  every  outer  iteration 
generates  the  orthonormal  vectors  v,  ,i  =  1,2, ...  m  - 1,  and  then  builds  the  vector  <5„  that  minimizes 
||f:i(«ri)  +  In  GMRES  the  u,’s  are  computed  such  that  they  form  an  orthonormal  basis  of  the 

Krylov  subspare  span  {i>\,  JfV\, . . . ,  v\  where  iq  is  obtained  by  normalizing  F(ur)). 
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Steps  2  and  3  of  Algorithm  1  are  precisely  the  GMRES(m)  process  for  solving  the  linear  system 
Jf&  =  with  the  particular  initial  guess  =  0  (see  (15|) .  With  the  standard  notation 

ro  for  the  initial  residual  of  the  above  linear  system  (here  ro  =  F(un))  each  outer  loop  of  the 
above  algorithm,  consisting  of  steps  2,  3,  and  4,  is  divided  in  two  main  stages.  The  first  stage 
is  an  Arnoldi  loop  that  builds  an  orthonormal  basis  Vm  =  [vt,  t>2,  •  •  • ,  thn]  of  the  Krylov  subspace 

Km  =  span  {ro,  Jr o,  J^ro . Jm-1ro}.  If  we  denote  by  Vj  the  N  x  j  matrix  with  column  vectors 

vi » vj, . . .  Vj,  and  by  Hm  the  (m  +  1)  x  m  upper  Hessenberg  matrix  whose  nonzero  entries  are  the 
hij,  0  <  t  <  j+  1,  1  <  j  <  m,  then  it  is  easy  to  establish  the  following  relation  [15]  which  is  crucial 
in  the  development  of  GMRES, 

=  I^m+l  •  (2-5) 

Step  (3)  of  NLGMR  computes  in  the  subspace  Km  the  approximate  solution  6n  that  minimizes  the 
residual  norm  ||Jf6  +  F(un)||.  Writing  6  =  Vmy,  where  y  is  in  an  m-vector,  and  observing  that  by 
definition  of  iq  we  have  F(un)  =  f3v\  we  see  that  y  must  minimize 

||Vmy  +  &v\ ||  =  ||Vm+i [Hmy  -  ^ei]||. 

This  explains  the  least  squares  problem  of  size  m  +  1,  with  upper  Hessenberg  matrix  Hm  in  step 
(3)  of  the  algorithm.  Then  u„+1  is  computed  at  the  end  of  step  (3),  by  adding  6„  the  optimal  6 
found.  However,  when  damping  is  included  this  will  be  replaced  by 

Wn+l  =  "b 

where  the  damping  coefficient  will  be  computed  by  some  damping  subroutine,  to  be  called  from 
NLGMR. 

For  simplicity,  we  have  omitted  several  details  on  the  practical  implementation  in  the  above 
presentation,  which  are  discussed  at  length  in  [15].  For  example,  in  practice  one  computes  progres¬ 
sively  the  least  squares  solution  ym  in  the  successive  steps  j  —  1, . .  .,m  of  stage  2.  Thus,  at  every 
step,  after  this  least  squares  solution  is  updated,  we  obtain  the  residual  norm  of  the  corresponding 
approximate  solution  xt  without  having  to  actually  compute  it  (see  [15]).  This  allows  us  to  stop 
at  the  appropriate  step. 

The  GMRES  algorithm  is  theoretically  equivalent  to  GCR  [5]  and  to  ORTHODIR  [10]  but 
is  less  costly  in  terms  of  storage  and  arithmetic  [15].  For  a  synthesis  and  general  description  of 
available  conjugate  gradient  type  methods  see  [14].  A  comparison  of  the  cost  of  each  step  of  these 
algorithms  shows  that  for  large  enough  m,  GMRES  costs  about  1/3  less  than  GCR/ORTHOMIN 
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in  arithmetic  while  storage  is  roughly  divided  by  a  factor  of  two.  Another  appealing  property  of 
the  algorithm  is  that  in  exact  arithmetic,  the  method  does  not  break  down  or,  to  be  more  accurate, 
that  it  breaks  down  only  when  it  delivers  the  exact  solution  [15]. 

Restarting  the  algorithm  as  is  done  in  step  (4)  corresponds  to  taking  a  new  Newton  step. 
In  fact  if  the  GMRES  algorithm  solves  the  linear  system  Jy&  =  —F(un)  exactly,  or  rather  with 
sufficient  accuracy  by  taking,  for  example,  m  sufficiently  large,  then  it  is  clear  that  the  above 
algorithm  is  nothing  but  Newton’s  method,  in  which  the  Jacobian  linear  systems  are  solved  by 
GMRES.  Thus,  the  above  algorithm  constitutes  a  basic  Newton/GMRES  method.  It  constitutes 
the  kernel  of  the  proposed  package. 

Perhaps  one  of  the  most  important  aspects  of  the  above  GMRES  iteration  is  that  the  Jacobian 
matrix  Jy  need  not  be  explicitly  available.  The  only  operations  on  the  Jacobian  matrix  Jy  that 
are  required  from  Arnoldi’s  method  are  matrix-vector  multiplications  w  =  Jyv,  which  can  be 
approximated  by 


Jy(u)v 


F(u  +  hv)  -  F(u) 
h 


(2.6) 


where  u  is  the  point  where  the  Jacobian  is  being  evaluated  and  h  is  some  carefully  chosen  small 
scalar.  The  idea  of  exploiting  the  above  approximation  is  not  new  and  was  extensively  used  in 
the  context  of  ODE  methods  [4,  7,  3,  2],  in  eigenvalue  calculations  [6],  and  is  quite  common  in 
nonlinear  equation  solution  methods  and  optimization  methods  (see,  for  example,  [9,  11,  18]). 


2.1.2.  Using  NLGMR  for  accelerating  a  basic  iteration 

There  are  numerous  scientific  codes  in  which  the  most  time  consuming  loop  of  a  program  is 
spent  performing  an  iteration  of  the  form 


Un+l  =  M(un). 


The  above  scheme  can  be  any  stationary  iterative  method  such  as  nonlinear  Gauss-Seidel,  SSOR, 
line  Jacobi,  Multigrid  sweeps,  or  other  particular  schemes.  It  is  important  to  observe  that  this 
iteration  is  only  attempting  to  solve  the  nonlinear  equation  F(u)  —  u  -  M (u)  =  0.  Since  NLGMR 
only  requires  function  evaluations,  it  can  be  applied  to  solving  F(u)  =  0  with  no  particular  difficulty 
by  using  the  user-supplied  subroutine  performing  one  step  of  the  above  basic  iteration. 

In  typical  cases  the  basic  iterative  methods  have  been  tested  and  fine-tuned  for  several  years 
on  a  particular  problem  for  which  it  is  specifically  designed.  Such  examples  abound  in  engineering. 
We  have  considered  two  such  applications  from  combustion  simulations  and  fluid  dynamics  that 
are  described  in  Section  2.3. 
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2.1.3.  Techniques  to  improve  global  convergence  properties 

As  is  well  known,  the  main  disadvantage  of  Newton’s  method  is  that  it  only  converges  when 
the  initial  guess  is  close  to  the  solution.  There  are  basically  two  classes  of  methods  for  preventing 
divergence.  The  first  is  a  combination  of  damping  and  trust  regions  techniques  and  the  second 
is  homotopy.  Both  methods  should  be  implemented  in  a  general  purpose  code.  In  our  feasibility 
study,  we  implemented  elementary  versions  of  both  approaches.  However,  we  consider  these  im¬ 
plementations  rudimentary;  substantial  effort  should  be  spent  in  perfecting  them  for  the  proposed 
software  package  because  they  are  among  the  most  important  factors  of  efficiency. 

Damping 

A  traditional  method  for  improving  reliability  in  Newton’s  method  is  to  replace  the  usual 
iteration  by  the  so-called  damped  Newton  iteration 

ttfi+i  =  “1”  (2-7) 

where  6„  =  -J^lF(un)  and  where  the  scalar  an  is  chosen,  for  example,  to  minimize  ||F(un  +  a6„)|| 
over  a  €  [0, 1],  This  requires  a  line  search  and  may  be  time  consuming  to  compute.  However,  more 
practical  mid-way  alternatives  can  be  used.  A  simple  strategy  is  to  start  with  <*  =  1  and  check  if 
the  corresponding  ||F(u„+1)||  shows  a  sufficient  decrease  as  compared  to  ||F(u„)||.  If  this  is  not  the 
case  we  halve  a  and  repeat  the  test  until  a  decrease  of  the  norm  is  achieved.  We  implemented  this 
strategy  in  our  preliminary  version  of  NLGMR.  Under  very  mild  conditions,  there  is  always  an  a 
sufficiently  small  for  which  the  norm  decreases  and,  although  global  convergence  is  not  guaranteed 
by  the  theory,  it  is  likely  to  be  achieved  in  practice  [17,  12].  It  is  also  easy  to  show  that,  even  when 
the  linear  systems  are  solved  only  approximately  by  reducing  the  initial  residual  norm  ||/,(un)||  by 
a  ratio  <1,  it  is  possible  to  achieve  a  decrease  in  the  norm  ||F(un  +  c*6„)|j  for  a  small  enough. 

Modified  Homotopy 

A  simple  but  attractive  form  of  improving  global  convergence  properties  of  NLGMR  is  a 
technique  borrowed  from  [18]  that  can  be  formulated  as  follows  for  the  case  where  F(u)  =  u-AI(u). 
At  step  n  +  1  choose  a  parameter  A  and  solve  the  following  equation  with  respect  to  u  to  get  u„+i: 

G(u,A)  =  A(u  -  M(u))  -f  (1  -  A)(u  -  M(u„))  =  0.  (2.8) 

Observe  that  G(u,  1)  =  u-M(u).  The  principle  of  this  damping  method  is  to  start  with  Ao  =  0,  for 
which  the  solution  of  (2.8)  is  trivial,  and  then  drive  A  to  1  while  solving  G(u,  A)  =  0  for  each  new 
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value  of  A  using  the  previous  un  as  the  initial  guess  to  the  new  system  G(u ,  A)  =  0.  This  has  proven  a 
simple  though  effective  technique  for  improving  global  convergence  behavior.  We  have  implemented 
several  different  strategies  for  varying  A.  The  most  promising  is  to  take  a  conservative  approach  at 
first  and  then  increase  A  according  to  the  decrease  obtained  in  the  residual  norm  ||u  —  A/(u)||  of 
two  successive  iterates.  This  ensures  that  super-linear  convergence  still  holds  at  the  limit. 

We  have  put  more  effort  in  developing  homotopy  techniques  than  damping  techniques. 

2.2.  Current  Software 

In  this  section  we  briefly  describe  the  software  developed  during  Phase  I  of  the  project.  We 
briefly  describe  the  package  as  it  stands  and  illustrate  its  usage  on  a  few  benchmark  problems. 
Before  starting  we  would  like  to  point  out  a  few  predominant  features  of  the  current  package.  First, 
we  have  maintained  a  high  level  of  modularity.  This  is  helpful  not  only  to  the  prospective  user 
but  also  to  us  when  debugging  and  testing.  Second,  we  have  maintained  generality  and  flexibility 
uy  making  many  of  the  features  optional,  to  be  turned  on  and  off  at  will.  For  example,  we  may 
run  the  code  without  the  homotopy  process  or  with  the  more  standard  damping  technique  or  a 
combination  of  both. 

Finally,  we  should  stress  that  we  plan  to  build  a  package  that  can  be  used  either  as  a  nonlinear 
solver  or  as  an  accelerator  of  existing  iterative  processes.  Thus,  NLPACK  should  be  able  to  ac¬ 
commodate  and  take  advantage  of  a  variety  of  the  user-provided  information.  It  should  also  easily 
interface  with  a  wide  range  of  different  problems. 

2.2.1.  Jacobian-free  part  of  NLPACK 

As  we  indicated  earlier,  we  were  able  to  solve  relatively  hard  problems  in  a  short  period  of  time 
and  with  a  minimum  of  programming  effort  by  using  the  Jacobian-free  approach.  We  concentrated 
most  of  our  effort  in  Phase  I  on  this  part  of  the  package.  A  short  description  follows. 

•  An  easy  to  follow  model  driver  model  and  corresponding  data  file.  At  present  this  is  where 
the  user  allocates  workspace,  chooses  parameters  and  selects  various  options  for  damping  and 
modified  homotopy.  Also  selected  are  the  dimension  of  the  Krylov  space,  the  parameters  dealing 
with  the  repetition  of  the  preconditioning,  and  different  tolerances  for  stopping  the  iterations. 
The  user  also  chooses  the  preconditioning  by  declaring  different  external  subroutines,  method, 
stopping  routine,  output  routine,  damping  routine,  homotopy  routine  and  declares  the  user 
supplied  function  routine  A/  or  F. 

•  NLGMR.  This  is  the  main  subroutine  of  the  Jacobian-free  part  of  NLPACK.  It  is  called  from 
the  driver  and  in  turn  calls  the  preconditioner,  i.e.  an  iterative  process  to  accelerate  by  GMRES 
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as  described  in  Section  2.1.2.  The  user  can  supply  his  own  preconditioner  or  use  one  of  three 
supplied  by  the  package. 

•  Three  general  purpose  preconditioners :  SSOR,  nonlinear  Jacobi,  and  functional  iteration  (which 
includes  the  case  of  no  preconditioning).  If  the  user  chooses  to  use  one  of  these,  he  must  provide 
the  function  subroutine,  which  is  called  from  these  preconditioners.  These  are  needed  in  a  scalar 
form,  i.e.  the  individual  components  are  required. 

•  Subroutines  for  damping,  modified  homotopy,  stopping,  output.  The  user  can  replace  any  of 
these  routines  by  his  own. 

2.2.2.  Newton/PCGPAK  part  of  NLPACK 

We  wrote  a  sample  program  to  solve  a  function  arising  from  a  five  point  operator,  using 
Newton’s  methods  and  the  existing  package  PCGPAK.  The  user  supplies  both  the  function  and 
the  Jacobian  matrix  in  either  Yale  Sparse  Matrix  format  or  in  diagonal  format.  The  user  has  the 
option  of  switching  on  damping  in  this  program.  This  is  only  a  preliminary  version  of  the  Newton 
code  we  plan  to  provide  in  this  part  of  the  package. 

2.3.  Numerical  Results 

It  was  important  to  test  NLPACK  on  a  variety  of  realistic  problems  —  ranging  from  easy  to 
hard  —  issued  from  real  applications.  We  selected  the  following  test  problems. 

Problem  number  1 

This  problem  is  a  simple  nonlinear  partial  differential  equation  of  an  elliptic  type,  with  a 
parameter  c  that  can  be  changed  to  vary  the  degree  of  nonlinearity  in  the  equation: 

-Au  +  c  u(ux  +  uv)  =  f 

on  the  unit  square,  and  u  =  g  on  the  boundary. 

Problem  number  2 

This  problem  is  similar  to  Problem  1,  but  it  has  an  exponential-type  nonlinearity: 

Au  +  c  eu  =  0 

on  the  unit  square,  and  u  =  0  on  the  boundary.  Again  the  parameter  c  can  be  varied  to  change 
the  difficulty  of  the  problem. 
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Problem  number  3:  Incompressible  viscous  flow  calculations 

The  flow  of  an  incompressible  viscous  fluid  in  a  cavity  driven  by  a  uniformly  moving  boundary 
has  been  successfully  calculated  in  our  Phase  I  project  for  relatively  high  Reynolds  numbers.  The 
driven  cavity  problem  has  received  a  great  deal  of  attention  in  the  computational  fluid  dynam¬ 
ics  literature  as  a  test  problem  for  comparing  numerical  techniques  for  solving  the  Navier-Stokes 
equations  (see,  for  example,  (1,  8,  16]  and  the  references  therein). 

We  have  used  the  stream  function/vorticity  function  form  of  the  two-dimensional  Navier-Stokes 
equation  : 

Aw  —  R^yW,  —  '&IWy|  =  0 

A¥  +  w  =  0, 

on  a  rectangle,  where  'Jr  is  the  stream  function,  and  w  is  the  vorticity.  Boundary  conditions  were 
obtained  by  translating  those  obtained  from  the  primitive  equations  which  use  the  velocities  and 
the  pressure  as  unknowns.  These  conditions  are 

*(*,0)  =  *„(x,0)  =0 

*(0,y)  =  ¥*(0,^  =  0 

¥(l»y)  =  ¥*(l,y)  =  0 

¥{x,l)  =  0 

¥y(X,  1)  =  1. 

From  our  experience  in  Phase  I,  implementing  a  Jacobian  method  such  as  NLGMR/SSOR  is 
much  easier  than  a  Newton-type  technique  using  the  Jacobian.  This  illustrates  one  of  the  main 
advantages  of  NLGMR:  the  programmer’s  task  is  simple  because  Jacobians  are  not  needed  and  not 
manipulated.  Our  tests  on  the  driven  cavity  problem  have  been  a  relatively  simple  task,  compared 
to  what  we  can  expect  from  implementing  a  multigrid  code,  for  example. 

Problem  number  4  :  Acceleration  of  the  TEA  CH  combustion  code 

TEACH  is  a  multidimensional  steady-state  compressible  flow  algorithm  based  on  a  low-order 
control-volume  discretization  of  the  primitive  variable  equations  in  conservation  form  over  orthog¬ 
onal,  tensor-product  grids  [13].  In  the  applications  for  which  it  is  best  suited,  a  “lumped”  chemical 
reaction  of  Arrhenius  form  is  postulated,  involving  only  a  handful  of  species.  The  nonlinearities 
arise  directly  from  the  convective  terms  and  the  reaction  source  terms,  and  indirectly  through 
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temperature-,  pressure-,  and  composition-dependent  laminar  transport  properties  and  thermody¬ 
namic  coefficients,  and  the  velocity-depend  at  turbulent  transport  properties.  A  multistage  varia¬ 
tion  of  the  block  Gauss-Seidel  method  is  used  to  solve  this  nonlinear  system.  The  outermost  stage 
consists  of  cycling  between  a  Poisson-like  pressure  correction  equation  (derived  from  a  truncated 
substitution  of  the  discrete  momentum  equations  into  the  discrete  continuity  equation,  which  is 
thus  eliminated)  and  the  transport  equations  for  all  of  the  other  unknown  fields.  Within  this  latter 
block,  the  equations  are  relaxed  cyclically,  field-by-field.  Within  the  sub-blocks  at  the  level  of  the 
individual  fields,  the  updates  are  generally  computed  in  a  block-line  fashion  (although  pointwise 
relaxation  schemes  have  also  been  applied),  so  that  an  elemental  tridiagonal  matrix  solver  is  the 
largest  implicit  aggregate  in  the  overall  calculation.  In  practice,  under- relaxation  of  the  updates 
is  necessary.  Variations  in  the  cyclic  order,  in  the  sequence  of  convergence  tolerances  imposed  on 
intermediate  iterates,  and  in  the  frequency  of  the  updating  of  the  nonlinear  coefficients  abound, 
depending  upon  the  application. 

A  parallel  project  at  Scientific  Computing  Associates,  also  funded  by  AFOSR,  is  to  improve  the 
TEACH  code  by  incorporating  state-of-the  art  nonlinear  methods.  NLGMR  is  one  of  the  methods 
considered.  Some  preliminary  tests  have  shown  NLGMR  to  be  marginally  successful,  i.e.  savings 
of  the  order  20%  to  28%  have  been  realized. 

2.3.1.  Solution  of  the  driven  cavity  problem 

In  Figures  1  and  2  we  plotted  the  output  from  NLGMR  applied  to  the  driven  cavity  problem 
on  a  16  x  16  grid  on  the  unit  square,  with  Reynolds  number  100.  These  plots  agree  well  with  plots 
obtained  by  researchers  using  other  methods  (1,  8,  16).  NLGMR  was  able  to  solve  the  problem  with 
higher  Reynolds  numbers  and  more  grid  points:  our  largest  run  was  R  =  100  and  a  mesh  of  size 
64  x  64.  The  only  place  where  NLGMR  seems  to  need  improvement  is  in  its  efficiency.  For  example, 
another  way  of  solving  this  problem  is  to  use  the  multigrid  method,  which  turn  out  to  be  more 
efficient  than  our  present  version  of  NLGMR.  On  the  other  hand,  the  multigrid  method  requires  a 
much  greater  coding  effort  and  is  less  suitable  for  a  general  purpose  package.  In  fact,  we  can  use  an 
existing  multigrid  code  as  a  preconditioner  for  NLGMR,  and  we  expect  that  NLGMR/multigrid  will 
accelerate  the  multigrid  solution  of  this  problem  for  many  grid  points  and  large  Reynolds  numbers. 

2.3.2.  NLGMR  as  an  accelerator 

In  Figure  3  we  illustrate  NLGMR  as  an  accelerator.  For  this  diagram  we  solved  the  driven 
cavity  problem  on  a  16  x  16  mesh  on  the  unit  square  to  an  accuracy  of  1.0d-6.  We  made  the  problem 
increasingly  difficult  by  increasing  the  Reynolds  number  from  1.0  to  600  and  measured  the  efficiency 
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of  the  method  by  counting  the  number  of  function  calls.  We  compared  SSOR  with  NLGMR 
applied  to  accelerate  (and  improve)  SSOR.  The  results  confirmed  our  beliefs  about  NLGMR  as  an 
accelerator,  and  provided  data  about  NLGMR  as  an  enhancer  of  other  methods.  Indeed,  SSOR 
broke  down  at  Reynolds  numbers  higher  than  150.  NLGMR/SSOR  performed  better  than  SSOR 
whenever  SSOR  solved  the  problem.  In  addition,  NLGMR/SSOR  solved  problems  which  SSOR 
could  not  handle. 

It  is  interesting  to  remark  that  for  Reynolds  numbers  higher  than  400  the  convergence  of 
NLGMR  was  becoming  slow.  We  switched  on  some  of  the  modified  homotopy  options,  without 
much  effect  (although  these  turned  out  to  greatly  reduce  costs  on  other  problems).  Then  we 
tried  slightly  increasing  the  maximum  dimension  of  the  Krylov  space  and  we  switched  on  the 
damping  option;  these  options  sped  up  the  convergence  rate  of  NLGMR.  This  example  illustrates 
the  importance  of  having  easy  access  to  the  different  parameters. 


2.3.3.  Some  more  tests  on  the  usage  of  NLPACK 

In  Figure  4  we  illustrate  the  usage  of  both  branches  of  NLPACK.  We  solved  Problem  1  on  a 
20  x  20  grid  on  the  unit  square,  with  c  =  5.0.  We  plotted  the  norm  of  the  residual  against  CPU 
time  and  the  number  of  function  calls  because  function  evaluations  are  inexpensive  in  this  case. 
Indeed,  for  NLGMR  the  CPU  time  is  almost  proportional  to  the  number  of  function  calls,  but 
for  Newton/PCGPAK  this  is  no  longer  true.  Naturally  Newton/PCGPAK  using  GMRES  as  an 
iterative  method  performs  better  because  it  takes  advantage  of  the  knowledge  of  the  Jacobian  and 
its  structure.  For  the  sake  of  comparison,  we  made  the  somewhat  arbitrary  decision  to  count  each 
Jacobian  call  as  nz  scalar  function  calls,  where  nz  is  the  number  of  non-zero  entries  in  the  Jacobian 
matrix. 

We  draw  two  conclusions  from  this  experiment.  First,  the  performance  of  NLGMR  compares 
well  with  other  methods  (for  some  problems  it  might  be  well  worth  paying  the  extra  time  for  not 
having  to  provide  and  store  the  Jacobian).  Second,  providing  the  Newton/PCGPAK  part  of  the 
package  is  worthwhile  because  one  might  get  substantial  improvements  if  the  Jacobian  is  available. 

In  Figure  5  we  compare  differ  preconditioners  in  solving  problem  number  2  for  c  =  5.0 
on  a  20  x  20  grid  on  the  unit  svm  We  compare  the  preconditioners  currently  available  and  a 
preconditioner  from  a  user-supniied  r:  :thod  to  solve  the  problem  u„+i  =  -cA^1(exp(«„)).  For 
this  problem  the  user-supplied  Pica*-'  type  method  is  very  efficient.  Indeed,  it  performs  better 
than  all  the  other  preconditioners.  We  can  also  see  that,  as  expecved,  SSOR  does  better  than  the 
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other  preconditioners.  The  Jacobi  preconditioner  does  not  improve  over  no  preconditioning  for  this 
problem.  Once  again  this  emphasizes  the  importance  of  modularity  in  NLPACK. 

2.3.4.  Other  numerical  experiments  and  work  in  progress 

We  experimented  with  repeating  the  preconditioner  (or  the  user  method  if  that  is  the  case). 
For  some  problems  it  pays  replacing  M  from  u„+i  =  M(un)  by  Mk  for  k  in  the  range  2  to  5. 
However,  for  larger  values  of  k  the  method  deteriorates  in  that  the  total  number  of  iterations  in 
GMRES  increases. 

We  also  tested  different  strategies  for  modified  homotopy.  We  concluded  that  homotopy  is  very 
useful  for  some  problems.  Moreover,  it  makes  a  difference  how  fast  and  when  A  is  decreasing  (right 
now  we  have  five  strategies  for  choosing  A).  At  present,  modified  homotopy  has  the  drawback 
of  being  based  on  the  user’s  intuitive  feeling  or  heuristic  thinking.  For  this  reason  we  plan  to 
implement  a  new  homotopy  technique  that  generalizes  the  current  modified  homotopy  method. 
This  will  give  the  user  the  choice  of  switching  on  the  automatic  choice  of  A  and/or  restarting  or 
choosing  it  according  to  his  knowledge  of  the  problem. 

At  the  time  of  this  writing,  we  continue  the  development  of  NLPACK  on  several  fronts.  We  are 
almost  finished  implementing  the  homotopy  in  NLGMR  as  explained  above.  We  are  now  working  on 
turning  the  sample  Newton/PCGPAK  program  into  a  working  tool,  and  we  have  started  providing 
a  few  routines  to  interface  some  standard  formats  for  user-provided  Jacobians  with  the  Yale  Sparse 
Matrix  format  (used  by  PCGPAK).  Finally,  we  are  expanding  the  pool  of  benchmark  problems. 

2.4.  Conclusion 

In  a  relatively  short  time  we  have  been  able  to  write  a  short  version  of  our  package,  consisting 
mainly  of  the  Jacobian-free  part,  and  to  use  it  to  solve  a  few  benchmark  problems  that  demonstrate 
the  viability  of  the  proposed  code.  Perhaps  among  the  strongest  points  of  the  projected  package 
are  its  ease  of  use  and  the  variety  of  proposed  techniques. 

It  is  important  to  realize  that  this  is  only  preliminary  work  and  that  there  are  many  issues  that 
will  be  raised  concerning  the  performance  improvement  of  the  method.  Among  them  are  at  least 
two  important  issues  that  will  require  substantial  programming  effort  and  that  will  undoubtedly 
improve  the  effectiveness  of  the  current  NLGMR  package 

1.  Scaling  problems.  Variables  having  widely  different  orders  of  magnitude  are  common.  In  these 
situations  it  is  clearly  important  to  rescale  the  variables  according  to  size.  This  was  stressed 
in  particular  by  Brown  and  Hindmarsh  (2)  who  propose  a  scaled  version  of  IOM,  a  method 
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related  to  GMRES.  Because  the  Jacobian  is  approximated  by  finite  differences,  a  bad  scaling 
can  lead  to  poor  performance. 

2.  Robust  homotopy  and  damping  techniques.  As  was  mentioned  earlier  there  are  elaborate 
damping  and  homotopy  methods  that  require  programming  efforts  that  go  beyond  a  six  month 
period.  These  include  trust  region  techniques  combined  with  steepest  descent  methods,  and 
the  more  standard  homotopy  in  place  of  the  modified  homotopy  method  implemented  in  our 
preliminary  package. 

3.  Publications 

None. 

4.  Personnel 

1.  Dr.  Mark  W.  Angevine,  Principal  Investigator 

2.  Dr.  Youcef  Saad 

3.  Dr.  Tony  F.C.  Chan 

4.  Dr.  Peter  Farkas 

5.  Interactions  (Coupling  Activities) 

None. 

6.  New  discoveries,  inventions,  or  patent  disclosures  and  specific  applications  stemming  from  the  research 

We  have  proven  the  feasibility  of  our  proposed  software  package  for  solving  general  nonlinear 
equations. 
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Contour  of  U>  for  R-100.0.  on  a  64  by  64  mesh 


Figure  2:  Contour  of  the  vorticity  in  the  driven  cavity  prob¬ 
lem  for  Re=100 
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SS0R  vs.  NLGMR+SS0R 


Figure  3:  Performances  of  SSOR  and  NLGMR/SSOR  for 
problem  number  1. 
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NLGMR+SS0R  vs.  Newton +GMRES+SS0R 


"a  Jacobian  matrix  call  -  nz  scalar  function  calls 


Figure  4:  Performances  of  NLGMR/SSOR  and  Newton/GMRES| 
preconditioned  by  SSOR  for  problem  number  1. 
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